Overview

Dataset statistics

Number of variables: 27
Number of observations: 899164
Missing cells: 751259
Missing cells (%): 3.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 185.2 MiB
Average record size in memory: 216.0 B
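
The two derived figures in this block follow from the raw counts. The sketch below recomputes them, assuming that "Missing cells (%)" is taken over all cells (variables × observations) and that MiB means 1024² bytes; the variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 27
n_observations = 899_164
missing_cells = 751_259
total_size_mib = 185.2

# Share of missing cells over the full variables x observations grid.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average bytes per record, assuming 1 MiB = 1024**2 bytes.
avg_record_bytes = total_size_mib * 1024**2 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the figures the report shows (3.1% and 216.0 B).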

Variable types

Numeric: 8
Categorical: 19

Alerts

Name has a high cardinality: 779583 distinct values [High cardinality]
City has a high cardinality: 32581 distinct values [High cardinality]
State has a high cardinality: 51 distinct values [High cardinality]
Bank has a high cardinality: 5802 distinct values [High cardinality]
BankState has a high cardinality: 56 distinct values [High cardinality]
ApprovalDate has a high cardinality: 9859 distinct values [High cardinality]
ApprovalFY has a high cardinality: 52 distinct values [High cardinality]
ChgOffDate has a high cardinality: 6448 distinct values [High cardinality]
DisbursementDate has a high cardinality: 8472 distinct values [High cardinality]
DisbursementGross has a high cardinality: 118859 distinct values [High cardinality]
ChgOffPrinGr has a high cardinality: 83165 distinct values [High cardinality]
GrAppv has a high cardinality: 22128 distinct values [High cardinality]
SBA_Appv has a high cardinality: 38326 distinct values [High cardinality]
LoanNr_ChkDgt is highly overall correlated with ApprovalFY [High correlation]
Zip is highly overall correlated with State and 1 other fields [High correlation]
State is highly overall correlated with Zip and 1 other fields [High correlation]
BankState is highly overall correlated with Zip and 1 other fields [High correlation]
ApprovalFY is highly overall correlated with LoanNr_ChkDgt and 1 other fields [High correlation]
UrbanRural is highly overall correlated with ApprovalFY [High correlation]
RevLineCr is highly imbalanced (61.3%) [Imbalance]
LowDoc is highly imbalanced (80.6%) [Imbalance]
BalanceGross is highly imbalanced (> 99.9%) [Imbalance]
ChgOffPrinGr is highly imbalanced (78.8%) [Imbalance]
ChgOffDate has 736465 (81.9%) missing values [Missing]
NoEmp is highly skewed (γ1 = 80.24824355) [Skewed]
CreateJob is highly skewed (γ1 = 36.99135473) [Skewed]
RetainedJob is highly skewed (γ1 = 36.85481184) [Skewed]
LoanNr_ChkDgt has unique values [Unique]
NAICS has 201948 (22.5%) zeros [Zeros]
CreateJob has 629248 (70.0%) zeros [Zeros]
RetainedJob has 440403 (49.0%) zeros [Zeros]
FranchiseCode has 208835 (23.2%) zeros [Zeros]
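
The imbalance percentages in the alerts can plausibly be read as an entropy-based score, 1 − H / log2(k), where H is the Shannon entropy of the class frequencies and k the number of classes. That reading is an assumption about how the profiler derives the figure; the report itself does not state the formula. A minimal sketch, using the NewExist counts listed in the Variables section as input (`imbalance_score` is a hypothetical helper name):

```python
import math

def imbalance_score(counts):
    """Return 1 - H/log2(k): 0 for perfectly balanced classes, near 1 when one class dominates."""
    counts = [c for c in counts if c > 0]
    k = len(counts)
    if k <= 1:
        return 0.0  # convention chosen for this sketch: a single class scores 0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts)
    return 1 - entropy / math.log2(k)

# NewExist value counts as reported (1.0, 2.0, 0.0).
new_exist = [644869, 253125, 1034]
print(f"NewExist imbalance: {imbalance_score(new_exist):.1%}")
```

Under this assumption NewExist scores lower than the four flagged variables, which fits with it not appearing among the imbalance alerts.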

Reproduction

Analysis started: 2023-06-09 14:31:33.729715
Analysis finished: 2023-06-09 14:33:30.652868
Duration: 1 minute and 56.92 seconds
Software version: pandas-profiling v3.6.6
Download configuration: config.json
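
The reported duration is the difference between the two timestamps. A short check with the standard library, using the timestamps exactly as printed in this block:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2023-06-09 14:31:33.729715")
finished = datetime.fromisoformat("2023-06-09 14:33:30.652868")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```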

Variables

LoanNr_ChkDgt
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 899164
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.7726123 × 10^9
Minimum: 1.000014 × 10^9
Maximum: 9.996003 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 1.000014 × 10^9
5-th percentile: 1.3484572 × 10^9
Q1: 2.5897575 × 10^9
Median: 4.361439 × 10^9
Q3: 6.9046265 × 10^9
95-th percentile: 9.1648039 × 10^9
Maximum: 9.996003 × 10^9
Range: 8.995989 × 10^9
Interquartile range (IQR): 4.314869 × 10^9

Descriptive statistics

Standard deviation: 2.538175 × 10^9
Coefficient of variation (CV): 0.53182091
Kurtosis: -1.086499
Mean: 4.7726123 × 10^9
Median Absolute Deviation (MAD): 2.0134 × 10^9
Skewness: 0.3647571
Sum: 4.2913612 × 10^15
Variance: 6.4423325 × 10^18
Monotonicity: Strictly increasing
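
Several of these statistics are simple functions of others in the same block. The sketch below recomputes them from the rounded figures shown for LoanNr_ChkDgt, so agreement is only expected up to rounding; the dictionary keys are illustrative labels.

```python
import math

# Values copied from the LoanNr_ChkDgt statistics (rounded as reported).
n = 899_164
minimum, maximum = 1.000014e9, 9.996003e9
q1, q3 = 2.5897575e9, 6.9046265e9
mean, std = 4.7726123e9, 2.538175e9

derived = {
    "Range": maximum - minimum,  # reported 8.995989 x 10^9
    "IQR": q3 - q1,              # reported 4.314869 x 10^9
    "CV": std / mean,            # reported 0.53182091
    "Variance": std ** 2,        # reported 6.4423325 x 10^18
    "Sum": mean * n,             # reported 4.2913612 x 10^15
}
for name, value in derived.items():
    print(f"{name}: {value:.7g}")
```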
Histogram with fixed size bins (bins=50)
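
A histogram with fixed size bins splits the range between the minimum and maximum into equally wide intervals and counts the values in each. The following is a minimal pure-Python sketch of that idea, not the profiler's own plotting code; `fixed_width_histogram` is a hypothetical helper, and the last bin is treated as closed on the right so that the maximum is counted.

```python
def fixed_width_histogram(values, bins=50):
    """Count values in `bins` equally wide intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum lands exactly on the last edge, so clamp it into the final bin.
        idx = min(int((v - lo) / width), bins - 1) if width else 0
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges

counts, edges = fixed_width_histogram([0, 1, 2, 3, 4], bins=2)
print(counts, edges)
```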
Value                    Count     Frequency (%)
1000014003               1         < 0.1%
5944984007               1         < 0.1%
5944874009               1         < 0.1%
5944884001               1         < 0.1%
5944904005               1         < 0.1%
5944914008               1         < 0.1%
5944924000               1         < 0.1%
5944934003               1         < 0.1%
5944944006               1         < 0.1%
5944954009               1         < 0.1%
Other values (899154)    899154    > 99.9%

Value                    Count     Frequency (%)
1000014003               1         < 0.1%
1000024006               1         < 0.1%
1000034009               1         < 0.1%
1000044001               1         < 0.1%
1000054004               1         < 0.1%
1000084002               1         < 0.1%
1000093009               1         < 0.1%
1000094005               1         < 0.1%
1000104006               1         < 0.1%
1000124001               1         < 0.1%

Value                    Count     Frequency (%)
9996003010               1         < 0.1%
9995973006               1         < 0.1%
9995613003               1         < 0.1%
9995603000               1         < 0.1%
9995573004               1         < 0.1%
9995563001               1         < 0.1%
9995493004               1         < 0.1%
9995473009               1         < 0.1%
9995453003               1         < 0.1%
9995423005               1         < 0.1%

Name
Categorical

Distinct: 779583
Distinct (%): 86.7%
Missing: 14
Missing (%): < 0.1%
Memory size: 6.9 MiB

SUBWAY: 1269
QUIZNO'S SUBS: 433
COLD STONE CREAMERY: 366
QUIZNO'S: 345
DOMINO'S PIZZA: 329
Other values (779578): 896408

Length

Max length: 30
Median length: 23
Mean length: 21.775963
Min length: 1

Characters and Unicode

Total characters: 19579857
Distinct characters: 91
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
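
As an illustration of those character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The module returns two-letter codes (for example `Lu` for Uppercase Letter, `Zs` for Space Separator, `Po` for Other Punctuation, `Nd` for Decimal Number) rather than the long names used in this report, and it does not expose scripts or blocks. `category_counts` is a hypothetical helper; the input strings are taken from the Name sample rows.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts

sample = ["ABC HOBBYCRAFT", "BIG BUCKS PAWN & JEWELRY, LLC"]
print(category_counts(sample).most_common())
```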

Unique

Unique: 706468
Unique (%): 78.6%
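
"Distinct" and "Unique" measure different things: distinct counts how many different values occur, while unique counts the values that occur exactly once. A minimal sketch of the distinction, assuming missing values are excluded from both counts (`distinct_and_unique` is a hypothetical helper and the input list is made up for illustration):

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

names = ["SUBWAY", "SUBWAY", "DAIRY QUEEN", "ABC HOBBYCRAFT", None]
print(distinct_and_unique(names))
```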

Sample

1st row: ABC HOBBYCRAFT
2nd row: LANDMARK BAR & GRILLE (THE)
3rd row: WHITLOCK DDS, TODD M.
4th row: BIG BUCKS PAWN & JEWELRY, LLC
5th row: ANASTASIA CONFECTIONS, INC.

Common Values

Value                    Count     Frequency (%)
SUBWAY                   1269      0.1%
QUIZNO'S SUBS            433       < 0.1%
COLD STONE CREAMERY      366       < 0.1%
QUIZNO'S                 345       < 0.1%
DOMINO'S PIZZA           329       < 0.1%
DAIRY QUEEN              328       < 0.1%
THE UPS STORE            323       < 0.1%
DUNKIN DONUTS            299       < 0.1%
MATCO TOOLS              288       < 0.1%
MAIL BOXES ETC           280       < 0.1%
Other values (779573)    894890    99.5%

Length

Histogram of lengths of the category
Value                    Count     Frequency (%)
inc                      263379    8.4%
(blank)                  100280    3.2%
llc                      77826     2.5%
and                      28959     0.9%
the                      28389     0.9%
of                       23026     0.7%
dba                      20214     0.6%
co                       18216     0.6%
a                        18114     0.6%
services                 17318     0.6%
Other values (226643)    2530176   80.9%

Most occurring characters

ValueCountFrequency (%)
2231639
 
11.4%
E 1354056
 
6.9%
I 1226719
 
6.3%
A 1177821
 
6.0%
N 1170319
 
6.0%
R 1052562
 
5.4%
C 1038114
 
5.3%
S 1009495
 
5.2%
O 933206
 
4.8%
T 917437
 
4.7%
Other values (81) 7468489
38.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 14311292
73.1%
Lowercase Letter 2249775
 
11.5%
Space Separator 2231639
 
11.4%
Other Punctuation 712203
 
3.6%
Decimal Number 38461
 
0.2%
Dash Punctuation 29147
 
0.1%
Open Punctuation 3600
 
< 0.1%
Close Punctuation 2973
 
< 0.1%
Math Symbol 498
 
< 0.1%
Currency Symbol 198
 
< 0.1%
Other values (2) 71
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 1354056
 
9.5%
I 1226719
 
8.6%
A 1177821
 
8.2%
N 1170319
 
8.2%
R 1052562
 
7.4%
C 1038114
 
7.3%
S 1009495
 
7.1%
O 933206
 
6.5%
T 917437
 
6.4%
L 840208
 
5.9%
Other values (16) 3591355
25.1%
Lowercase Letter
ValueCountFrequency (%)
e 250402
11.1%
n 238175
10.6%
a 206694
9.2%
r 187739
 
8.3%
i 180961
 
8.0%
o 178702
 
7.9%
t 151259
 
6.7%
s 141102
 
6.3%
c 123850
 
5.5%
l 107780
 
4.8%
Other values (16) 483111
21.5%
Other Punctuation
ValueCountFrequency (%)
. 273453
38.4%
, 244641
34.3%
& 104166
 
14.6%
' 73757
 
10.4%
/ 10119
 
1.4%
# 3514
 
0.5%
" 906
 
0.1%
! 473
 
0.1%
: 411
 
0.1%
* 244
 
< 0.1%
Other values (5) 519
 
0.1%
Decimal Number
ValueCountFrequency (%)
1 7572
19.7%
2 6295
16.4%
0 4730
12.3%
3 3993
10.4%
4 3678
9.6%
5 2715
 
7.1%
8 2585
 
6.7%
6 2467
 
6.4%
7 2234
 
5.8%
9 2192
 
5.7%
Math Symbol
ValueCountFrequency (%)
+ 468
94.0%
= 16
 
3.2%
> 9
 
1.8%
< 5
 
1.0%
Open Punctuation
ValueCountFrequency (%)
( 3597
99.9%
[ 3
 
0.1%
Close Punctuation
ValueCountFrequency (%)
) 2972
> 99.9%
] 1
 
< 0.1%
Modifier Symbol
ValueCountFrequency (%)
` 64
94.1%
^ 4
 
5.9%
Space Separator
ValueCountFrequency (%)
2231639
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 29147
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 198
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 16561067
84.6%
Common 3018790
 
15.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 1354056
 
8.2%
I 1226719
 
7.4%
A 1177821
 
7.1%
N 1170319
 
7.1%
R 1052562
 
6.4%
C 1038114
 
6.3%
S 1009495
 
6.1%
O 933206
 
5.6%
T 917437
 
5.5%
L 840208
 
5.1%
Other values (42) 5841130
35.3%
Common
ValueCountFrequency (%)
2231639
73.9%
. 273453
 
9.1%
, 244641
 
8.1%
& 104166
 
3.5%
' 73757
 
2.4%
- 29147
 
1.0%
/ 10119
 
0.3%
1 7572
 
0.3%
2 6295
 
0.2%
0 4730
 
0.2%
Other values (29) 33271
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 19579857
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2231639
 
11.4%
E 1354056
 
6.9%
I 1226719
 
6.3%
A 1177821
 
6.0%
N 1170319
 
6.0%
R 1052562
 
5.4%
C 1038114
 
5.3%
S 1009495
 
5.2%
O 933206
 
4.8%
T 917437
 
4.7%
Other values (81) 7468489
38.1%

City
Categorical

Distinct: 32581
Distinct (%): 3.6%
Missing: 30
Missing (%): < 0.1%
Memory size: 6.9 MiB

LOS ANGELES: 11558
HOUSTON: 10247
NEW YORK: 7846
CHICAGO: 6036
MIAMI: 5594
Other values (32576): 857853

Length

Max length: 30
Median length: 27
Mean length: 9.1030625
Min length: 1

Characters and Unicode

Total characters: 8184873
Distinct characters: 80
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 12872
Unique (%): 1.4%

Sample

1st row: EVANSVILLE
2nd row: NEW PARIS
3rd row: BLOOMINGTON
4th row: BROKEN ARROW
5th row: ORLANDO

Common Values

ValueCountFrequency (%)
LOS ANGELES 11558
 
1.3%
HOUSTON 10247
 
1.1%
NEW YORK 7846
 
0.9%
CHICAGO 6036
 
0.7%
MIAMI 5594
 
0.6%
SAN DIEGO 5363
 
0.6%
DALLAS 5085
 
0.6%
PHOENIX 4493
 
0.5%
LAS VEGAS 4390
 
0.5%
SPRINGFIELD 3738
 
0.4%
Other values (32571) 834784
92.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
city 23831
 
2.0%
san 21942
 
1.8%
new 16075
 
1.3%
los 13000
 
1.1%
angeles 12380
 
1.0%
lake 10729
 
0.9%
houston 10587
 
0.9%
beach 10462
 
0.9%
park 10316
 
0.9%
york 9724
 
0.8%
Other values (17695) 1066583
88.5%

Most occurring characters

ValueCountFrequency (%)
A 744405
 
9.1%
E 723098
 
8.8%
O 632510
 
7.7%
N 621338
 
7.6%
L 573578
 
7.0%
R 513614
 
6.3%
S 475392
 
5.8%
I 468344
 
5.7%
T 425108
 
5.2%
306936
 
3.8%
Other values (70) 2700550
33.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 7442897
90.9%
Lowercase Letter 398062
 
4.9%
Space Separator 306936
 
3.8%
Open Punctuation 14884
 
0.2%
Other Punctuation 11120
 
0.1%
Close Punctuation 9119
 
0.1%
Dash Punctuation 946
 
< 0.1%
Decimal Number 870
 
< 0.1%
Modifier Symbol 39
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 744405
 
10.0%
E 723098
 
9.7%
O 632510
 
8.5%
N 621338
 
8.3%
L 573578
 
7.7%
R 513614
 
6.9%
S 475392
 
6.4%
I 468344
 
6.3%
T 425108
 
5.7%
C 262549
 
3.5%
Other values (16) 2002961
26.9%
Lowercase Letter
ValueCountFrequency (%)
e 43411
10.9%
a 41550
10.4%
n 36545
9.2%
o 36384
9.1%
l 32699
 
8.2%
i 30470
 
7.7%
r 29637
 
7.4%
t 24529
 
6.2%
s 21884
 
5.5%
d 12360
 
3.1%
Other values (16) 88593
22.3%
Other Punctuation
ValueCountFrequency (%)
. 8672
78.0%
, 1215
 
10.9%
' 1134
 
10.2%
: 29
 
0.3%
& 22
 
0.2%
/ 21
 
0.2%
; 18
 
0.2%
# 5
 
< 0.1%
@ 2
 
< 0.1%
* 1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0 153
17.6%
1 145
16.7%
2 113
13.0%
5 90
10.3%
4 86
9.9%
3 78
9.0%
6 63
7.2%
9 51
 
5.9%
8 49
 
5.6%
7 42
 
4.8%
Open Punctuation
ValueCountFrequency (%)
( 14879
> 99.9%
[ 5
 
< 0.1%
Modifier Symbol
ValueCountFrequency (%)
` 38
97.4%
^ 1
 
2.6%
Space Separator
ValueCountFrequency (%)
306936
100.0%
Close Punctuation
ValueCountFrequency (%)
) 9119
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 946
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 7840959
95.8%
Common 343914
 
4.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 744405
 
9.5%
E 723098
 
9.2%
O 632510
 
8.1%
N 621338
 
7.9%
L 573578
 
7.3%
R 513614
 
6.6%
S 475392
 
6.1%
I 468344
 
6.0%
T 425108
 
5.4%
C 262549
 
3.3%
Other values (42) 2401023
30.6%
Common
ValueCountFrequency (%)
306936
89.2%
( 14879
 
4.3%
) 9119
 
2.7%
. 8672
 
2.5%
, 1215
 
0.4%
' 1134
 
0.3%
- 946
 
0.3%
0 153
 
< 0.1%
1 145
 
< 0.1%
2 113
 
< 0.1%
Other values (18) 602
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8184873
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 744405
 
9.1%
E 723098
 
8.8%
O 632510
 
7.7%
N 621338
 
7.6%
L 573578
 
7.0%
R 513614
 
6.3%
S 475392
 
5.8%
I 468344
 
5.7%
T 425108
 
5.2%
306936
 
3.8%
Other values (70) 2700550
33.0%

State
Categorical

HIGH CARDINALITY  HIGH CORRELATION 

Distinct: 51
Distinct (%): < 0.1%
Missing: 14
Missing (%): < 0.1%
Memory size: 6.9 MiB

CA: 130619
TX: 70458
NY: 57693
FL: 41212
PA: 35170
Other values (46): 563998

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 1798300
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: IN
2nd row: IN
3rd row: IN
4th row: OK
5th row: FL

Common Values

ValueCountFrequency (%)
CA 130619
 
14.5%
TX 70458
 
7.8%
NY 57693
 
6.4%
FL 41212
 
4.6%
PA 35170
 
3.9%
OH 32622
 
3.6%
IL 29669
 
3.3%
MA 25272
 
2.8%
MN 24373
 
2.7%
NJ 24035
 
2.7%
Other values (41) 428027
47.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ca 130619
 
14.5%
tx 70458
 
7.8%
ny 57693
 
6.4%
fl 41212
 
4.6%
pa 35170
 
3.9%
oh 32622
 
3.6%
il 29669
 
3.3%
ma 25272
 
2.8%
mn 24373
 
2.7%
nj 24035
 
2.7%
Other values (41) 428027
47.6%

Most occurring characters

ValueCountFrequency (%)
A 306176
17.0%
C 184957
10.3%
N 181727
10.1%
M 132549
 
7.4%
T 125069
 
7.0%
I 119518
 
6.6%
O 94906
 
5.3%
L 88819
 
4.9%
X 70458
 
3.9%
Y 68255
 
3.8%
Other values (14) 425866
23.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 1798300
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 306176
17.0%
C 184957
10.3%
N 181727
10.1%
M 132549
 
7.4%
T 125069
 
7.0%
I 119518
 
6.6%
O 94906
 
5.3%
L 88819
 
4.9%
X 70458
 
3.9%
Y 68255
 
3.8%
Other values (14) 425866
23.7%

Most occurring scripts

ValueCountFrequency (%)
Latin 1798300
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 306176
17.0%
C 184957
10.3%
N 181727
10.1%
M 132549
 
7.4%
T 125069
 
7.0%
I 119518
 
6.6%
O 94906
 
5.3%
L 88819
 
4.9%
X 70458
 
3.9%
Y 68255
 
3.8%
Other values (14) 425866
23.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1798300
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 306176
17.0%
C 184957
10.3%
N 181727
10.1%
M 132549
 
7.4%
T 125069
 
7.0%
I 119518
 
6.6%
O 94906
 
5.3%
L 88819
 
4.9%
X 70458
 
3.9%
Y 68255
 
3.8%
Other values (14) 425866
23.7%

Zip
Real number (ℝ)

Distinct: 33611
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 53804.391
Minimum: 0
Maximum: 99999
Zeros: 283
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3838
Q1: 27587
Median: 55410
Q3: 83704
95-th percentile: 95822
Maximum: 99999
Range: 99999
Interquartile range (IQR): 56117

Descriptive statistics

Standard deviation: 31184.159
Coefficient of variation (CV): 0.5795839
Kurtosis: -1.3359893
Mean: 53804.391
Median Absolute Deviation (MAD): 28206
Skewness: -0.16816663
Sum: 4.8378972 × 10^10
Variance: 9.7245178 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
10001 933
 
0.1%
90015 926
 
0.1%
93401 806
 
0.1%
90010 733
 
0.1%
33166 671
 
0.1%
90021 666
 
0.1%
59601 640
 
0.1%
65804 599
 
0.1%
3801 581
 
0.1%
59101 578
 
0.1%
Other values (33601) 892031
99.2%
ValueCountFrequency (%)
0 283
< 0.1%
1 24
 
< 0.1%
2 11
 
< 0.1%
3 5
 
< 0.1%
4 5
 
< 0.1%
5 5
 
< 0.1%
6 4
 
< 0.1%
7 6
 
< 0.1%
8 15
 
< 0.1%
9 24
 
< 0.1%
ValueCountFrequency (%)
99999 209
< 0.1%
99950 3
 
< 0.1%
99929 15
 
< 0.1%
99928 1
 
< 0.1%
99926 1
 
< 0.1%
99925 4
 
< 0.1%
99923 1
 
< 0.1%
99921 13
 
< 0.1%
99919 2
 
< 0.1%
99918 1
 
< 0.1%

Bank
Categorical

Distinct: 5802
Distinct (%): 0.6%
Missing: 1559
Missing (%): 0.2%
Memory size: 6.9 MiB

BANK OF AMERICA NATL ASSOC: 86853
WELLS FARGO BANK NATL ASSOC: 63503
JPMORGAN CHASE BANK NATL ASSOC: 48167
U.S. BANK NATIONAL ASSOCIATION: 35143
CITIZENS BANK NATL ASSOC: 35054
Other values (5797): 628885

Length

Max length: 30
Median length: 26
Mean length: 23.187946
Min length: 3

Characters and Unicode

Total characters: 20813616
Distinct characters: 50
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 923
Unique (%): 0.1%

Sample

1st row: FIFTH THIRD BANK
2nd row: 1ST SOURCE BANK
3rd row: GRANT COUNTY STATE BANK
4th row: 1ST NATL BK & TR CO OF BROKEN
5th row: FLORIDA BUS. DEVEL CORP

Common Values

ValueCountFrequency (%)
BANK OF AMERICA NATL ASSOC 86853
 
9.7%
WELLS FARGO BANK NATL ASSOC 63503
 
7.1%
JPMORGAN CHASE BANK NATL ASSOC 48167
 
5.4%
U.S. BANK NATIONAL ASSOCIATION 35143
 
3.9%
CITIZENS BANK NATL ASSOC 35054
 
3.9%
PNC BANK, NATIONAL ASSOCIATION 27351
 
3.0%
BBCN BANK 22978
 
2.6%
CAPITAL ONE NATL ASSOC 22248
 
2.5%
MANUFACTURERS & TRADERS TR CO 11265
 
1.3%
READYCAP LENDING, LLC 10664
 
1.2%
Other values (5792) 534379
59.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
bank 651608
18.5%
natl 318240
 
9.0%
assoc 306768
 
8.7%
of 142852
 
4.1%
national 125899
 
3.6%
america 100686
 
2.9%
association 84965
 
2.4%
fargo 63732
 
1.8%
wells 63650
 
1.8%
52264
 
1.5%
Other values (3602) 1606709
45.7%

Most occurring characters

ValueCountFrequency (%)
A 2762231
13.3%
2620014
12.6%
N 2105500
10.1%
S 1520499
 
7.3%
O 1336993
 
6.4%
T 1181841
 
5.7%
C 1134642
 
5.5%
I 1061717
 
5.1%
E 923739
 
4.4%
L 922583
 
4.4%
Other values (40) 5243857
25.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 17830764
85.7%
Space Separator 2620014
 
12.6%
Other Punctuation 341354
 
1.6%
Dash Punctuation 10861
 
0.1%
Decimal Number 9482
 
< 0.1%
Open Punctuation 584
 
< 0.1%
Close Punctuation 555
 
< 0.1%
Math Symbol 2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 2762231
15.5%
N 2105500
11.8%
S 1520499
 
8.5%
O 1336993
 
7.5%
T 1181841
 
6.6%
C 1134642
 
6.4%
I 1061717
 
6.0%
E 923739
 
5.2%
L 922583
 
5.2%
B 893994
 
5.0%
Other values (16) 3987025
22.4%
Decimal Number
ValueCountFrequency (%)
1 5538
58.4%
5 1268
 
13.4%
0 1258
 
13.3%
4 1222
 
12.9%
2 112
 
1.2%
7 33
 
0.3%
3 24
 
0.3%
9 17
 
0.2%
8 7
 
0.1%
6 3
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
. 192998
56.5%
, 94677
27.7%
& 50021
 
14.7%
/ 1833
 
0.5%
' 1811
 
0.5%
: 10
 
< 0.1%
# 2
 
< 0.1%
* 1
 
< 0.1%
% 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
2620014
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 10861
100.0%
Open Punctuation
ValueCountFrequency (%)
( 584
100.0%
Close Punctuation
ValueCountFrequency (%)
) 555
100.0%
Math Symbol
ValueCountFrequency (%)
+ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 17830764
85.7%
Common 2982852
 
14.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 2762231
15.5%
N 2105500
11.8%
S 1520499
 
8.5%
O 1336993
 
7.5%
T 1181841
 
6.6%
C 1134642
 
6.4%
I 1061717
 
6.0%
E 923739
 
5.2%
L 922583
 
5.2%
B 893994
 
5.0%
Other values (16) 3987025
22.4%
Common
ValueCountFrequency (%)
2620014
87.8%
. 192998
 
6.5%
, 94677
 
3.2%
& 50021
 
1.7%
- 10861
 
0.4%
1 5538
 
0.2%
/ 1833
 
0.1%
' 1811
 
0.1%
5 1268
 
< 0.1%
0 1258
 
< 0.1%
Other values (14) 2573
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 20813616
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 2762231
13.3%
2620014
12.6%
N 2105500
10.1%
S 1520499
 
7.3%
O 1336993
 
6.4%
T 1181841
 
5.7%
C 1134642
 
5.5%
I 1061717
 
5.1%
E 923739
 
4.4%
L 922583
 
4.4%
Other values (40) 5243857
25.2%

BankState
Categorical

HIGH CARDINALITY  HIGH CORRELATION 

Distinct: 56
Distinct (%): < 0.1%
Missing: 1566
Missing (%): 0.2%
Memory size: 6.9 MiB

CA: 118116
NC: 79514
IL: 65908
OH: 58461
SD: 51095
Other values (51): 524504

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 1795196
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: OH
2nd row: IN
3rd row: IN
4th row: OK
5th row: FL

Common Values

ValueCountFrequency (%)
CA 118116
 
13.1%
NC 79514
 
8.8%
IL 65908
 
7.3%
OH 58461
 
6.5%
SD 51095
 
5.7%
TX 47790
 
5.3%
RI 45366
 
5.0%
NY 39592
 
4.4%
VA 29002
 
3.2%
DE 24537
 
2.7%
Other values (46) 338217
37.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ca 118116
 
13.2%
nc 79514
 
8.9%
il 65908
 
7.3%
oh 58461
 
6.5%
sd 51095
 
5.7%
tx 47790
 
5.3%
ri 45366
 
5.1%
ny 39592
 
4.4%
va 29002
 
3.2%
de 24537
 
2.7%
Other values (46) 338217
37.7%

Most occurring characters

ValueCountFrequency (%)
A 241398
13.4%
C 229604
12.8%
N 187751
10.5%
I 158854
 
8.8%
O 102604
 
5.7%
L 96914
 
5.4%
D 96078
 
5.4%
T 94941
 
5.3%
M 85034
 
4.7%
S 73385
 
4.1%
Other values (14) 428633
23.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 1795196
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 241398
13.4%
C 229604
12.8%
N 187751
10.5%
I 158854
 
8.8%
O 102604
 
5.7%
L 96914
 
5.4%
D 96078
 
5.4%
T 94941
 
5.3%
M 85034
 
4.7%
S 73385
 
4.1%
Other values (14) 428633
23.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 1795196
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 241398
13.4%
C 229604
12.8%
N 187751
10.5%
I 158854
 
8.8%
O 102604
 
5.7%
L 96914
 
5.4%
D 96078
 
5.4%
T 94941
 
5.3%
M 85034
 
4.7%
S 73385
 
4.1%
Other values (14) 428633
23.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1795196
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 241398
13.4%
C 229604
12.8%
N 187751
10.5%
I 158854
 
8.8%
O 102604
 
5.7%
L 96914
 
5.4%
D 96078
 
5.4%
T 94941
 
5.3%
M 85034
 
4.7%
S 73385
 
4.1%
Other values (14) 428633
23.9%

NAICS
Real number (ℝ)

Distinct: 1312
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 398660.95
Minimum: 0
Maximum: 928120
Zeros: 201948
Zeros (%): 22.5%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 235210
Median: 445310
Q3: 561730
95-th percentile: 811192
Maximum: 928120
Range: 928120
Interquartile range (IQR): 326520

Descriptive statistics

Standard deviation: 263318.31
Coefficient of variation (CV): 0.66050691
Kurtosis: -1.0476526
Mean: 398660.95
Median Absolute Deviation (MAD): 176300
Skewness: -0.26287834
Sum: 3.5846157 × 10^11
Variance: 6.9336534 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 201948
 
22.5%
722110 27989
 
3.1%
722211 19448
 
2.2%
811111 14585
 
1.6%
621210 14048
 
1.6%
624410 10111
 
1.1%
812112 9230
 
1.0%
561730 8935
 
1.0%
621310 8733
 
1.0%
812320 7894
 
0.9%
Other values (1302) 576243
64.1%
ValueCountFrequency (%)
0 201948
22.5%
111110 32
 
< 0.1%
111120 3
 
< 0.1%
111130 1
 
< 0.1%
111140 94
 
< 0.1%
111150 49
 
< 0.1%
111160 2
 
< 0.1%
111191 3
 
< 0.1%
111199 7
 
< 0.1%
111211 16
 
< 0.1%
ValueCountFrequency (%)
928120 32
< 0.1%
928110 4
 
< 0.1%
927110 1
 
< 0.1%
926150 10
 
< 0.1%
926140 6
 
< 0.1%
926130 3
 
< 0.1%
926120 5
 
< 0.1%
926110 6
 
< 0.1%
925120 1
 
< 0.1%
925110 3
 
< 0.1%

ApprovalDate
Categorical

Distinct: 9859
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.9 MiB

7-Jul-93: 1131
30-Jan-04: 1032
8-Jul-93: 780
4-Oct-04: 658
30-Sep-03: 608
Other values (9854): 894955

Length

Max length: 9
Median length: 9
Mean length: 8.7211399
Min length: 8

Characters and Unicode

Total characters: 7841735
Distinct characters: 33
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 952
Unique (%): 0.1%

Sample

1st row: 28-Feb-97
2nd row: 28-Feb-97
3rd row: 28-Feb-97
4th row: 28-Feb-97
5th row: 28-Feb-97
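
ApprovalDate is profiled as Categorical because the values are day-month-year strings such as "28-Feb-97". A minimal sketch of converting them to dates with the standard library follows; `parse_approval_date` is a hypothetical helper. Two caveats apply: `%b` matches English month abbreviations only under an English or C locale, and `%y` maps two-digit years 69 to 99 onto 1969 to 1999 and 00 to 68 onto 2000 to 2068, so any approval dated before 1969 would need separate handling.

```python
from datetime import date, datetime

def parse_approval_date(text):
    """Parse strings like '7-Jul-93' (day, abbreviated month, two-digit year)."""
    return datetime.strptime(text, "%d-%b-%y").date()

for raw in ["28-Feb-97", "7-Jul-93", "30-Jan-04"]:
    print(raw, "->", parse_approval_date(raw).isoformat())
```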

Common Values

ValueCountFrequency (%)
7-Jul-93 1131
 
0.1%
30-Jan-04 1032
 
0.1%
8-Jul-93 780
 
0.1%
4-Oct-04 658
 
0.1%
30-Sep-03 608
 
0.1%
30-Jun-05 572
 
0.1%
18-Apr-05 534
 
0.1%
6-Jul-93 523
 
0.1%
21-Jan-05 498
 
0.1%
27-Sep-02 497
 
0.1%
Other values (9849) 892331
99.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
7-jul-93 1131
 
0.1%
30-jan-04 1032
 
0.1%
8-jul-93 780
 
0.1%
4-oct-04 658
 
0.1%
30-sep-03 608
 
0.1%
30-jun-05 572
 
0.1%
18-apr-05 534
 
0.1%
6-jul-93 523
 
0.1%
21-jan-05 498
 
0.1%
27-sep-02 497
 
0.1%
Other values (9849) 892331
99.2%

Most occurring characters

ValueCountFrequency (%)
- 1798328
22.9%
0 687310
 
8.8%
1 492781
 
6.3%
9 470677
 
6.0%
2 464364
 
5.9%
u 233553
 
3.0%
3 229057
 
2.9%
a 227906
 
2.9%
J 221861
 
2.8%
e 219341
 
2.8%
Other values (23) 2796557
35.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 3345915
42.7%
Dash Punctuation 1798328
22.9%
Lowercase Letter 1798328
22.9%
Uppercase Letter 899164
 
11.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
u 233553
13.0%
a 227906
12.7%
e 219341
12.2%
r 163835
9.1%
p 163275
9.1%
n 145374
8.1%
c 139688
7.8%
g 78776
 
4.4%
y 77194
 
4.3%
l 76487
 
4.3%
Other values (4) 272899
15.2%
Decimal Number
ValueCountFrequency (%)
0 687310
20.5%
1 492781
14.7%
9 470677
14.1%
2 464364
13.9%
3 229057
 
6.8%
6 208904
 
6.2%
5 203699
 
6.1%
7 199006
 
5.9%
4 197260
 
5.9%
8 192857
 
5.8%
Uppercase Letter
ValueCountFrequency (%)
J 221861
24.7%
M 160822
17.9%
A 158983
17.7%
S 83068
 
9.2%
D 69931
 
7.8%
O 69757
 
7.8%
N 68400
 
7.6%
F 66342
 
7.4%
Dash Punctuation
ValueCountFrequency (%)
- 1798328
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 5144243
65.6%
Latin 2697492
34.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
u 233553
 
8.7%
a 227906
 
8.4%
J 221861
 
8.2%
e 219341
 
8.1%
r 163835
 
6.1%
p 163275
 
6.1%
M 160822
 
6.0%
A 158983
 
5.9%
n 145374
 
5.4%
c 139688
 
5.2%
Other values (12) 862854
32.0%
Common
ValueCountFrequency (%)
- 1798328
35.0%
0 687310
 
13.4%
1 492781
 
9.6%
9 470677
 
9.1%
2 464364
 
9.0%
3 229057
 
4.5%
6 208904
 
4.1%
5 203699
 
4.0%
7 199006
 
3.9%
4 197260
 
3.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7841735
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 1798328
22.9%
0 687310
 
8.8%
1 492781
 
6.3%
9 470677
 
6.0%
2 464364
 
5.9%
u 233553
 
3.0%
3 229057
 
2.9%
a 227906
 
2.9%
J 221861
 
2.8%
e 219341
 
2.8%
Other values (23) 2796557
35.7%

ApprovalFY
Categorical

HIGH CARDINALITY  HIGH CORRELATION 

Distinct: 52
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.9 MiB

2005: 77525
2006: 76040
2007: 71876
2004: 68290
2003: 58193
Other values (47): 547240

Length

Max length: 5
Median length: 4
Mean length: 4.00002
Min length: 4

Characters and Unicode

Total characters: 3596674
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: 1997
2nd row: 1997
3rd row: 1997
4th row: 1997
5th row: 1997

Common Values

ValueCountFrequency (%)
2005 77525
 
8.6%
2006 76040
 
8.5%
2007 71876
 
8.0%
2004 68290
 
7.6%
2003 58193
 
6.5%
1995 45758
 
5.1%
2002 44391
 
4.9%
1996 40112
 
4.5%
2008 39540
 
4.4%
1997 37748
 
4.2%
Other values (42) 339691
37.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2005 77525
 
8.6%
2006 76040
 
8.5%
2007 71876
 
8.0%
2004 68290
 
7.6%
2003 58193
 
6.5%
1995 45758
 
5.1%
2002 44391
 
4.9%
1996 40112
 
4.5%
2008 39540
 
4.4%
1997 37748
 
4.2%
Other values (42) 339691
37.8%

Most occurring characters

ValueCountFrequency (%)
0 1167176
32.5%
9 704676
19.6%
2 639911
17.8%
1 435726
 
12.1%
5 125258
 
3.5%
6 118366
 
3.3%
7 112975
 
3.1%
8 104656
 
2.9%
4 102220
 
2.8%
3 85692
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 3596656
> 99.9%
Uppercase Letter 18
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1167176
32.5%
9 704676
19.6%
2 639911
17.8%
1 435726
 
12.1%
5 125258
 
3.5%
6 118366
 
3.3%
7 112975
 
3.1%
8 104656
 
2.9%
4 102220
 
2.8%
3 85692
 
2.4%
Uppercase Letter
ValueCountFrequency (%)
A 18
100.0%
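
The 18 occurrences of the letter "A", together with a maximum length of 5 against a minimum of 4, suggest that a small number of ApprovalFY values carry a letter alongside the four-digit year, which would explain why the column is typed as Categorical. The exact strings are not shown in this report, so the inputs below (such as "1976A") are hypothetical. A minimal sketch of extracting the year, with `fiscal_year` as an illustrative helper:

```python
import re

def fiscal_year(text):
    """Return the first four-digit run in `text` as an int, or None if there is none."""
    match = re.search(r"\d{4}", text)
    return int(match.group()) if match else None

# "1997" appears in the sample rows; "1976A" is a made-up example of a suffixed value.
for raw in ["1997", "1976A"]:
    print(raw, "->", fiscal_year(raw))
```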

Most occurring scripts

ValueCountFrequency (%)
Common 3596656
> 99.9%
Latin 18
 
< 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1167176
32.5%
9 704676
19.6%
2 639911
17.8%
1 435726
 
12.1%
5 125258
 
3.5%
6 118366
 
3.3%
7 112975
 
3.1%
8 104656
 
2.9%
4 102220
 
2.8%
3 85692
 
2.4%
Latin
ValueCountFrequency (%)
A 18
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3596674
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1167176
32.5%
9 704676
19.6%
2 639911
17.8%
1 435726
 
12.1%
5 125258
 
3.5%
6 118366
 
3.3%
7 112975
 
3.1%
8 104656
 
2.9%
4 102220
 
2.8%
3 85692
 
2.4%

Term
Real number (ℝ)

Distinct: 412
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 110.77308
Minimum: 0
Maximum: 569
Zeros: 810
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 16
Q1: 60
Median: 84
Q3: 120
95-th percentile: 300
Maximum: 569
Range: 569
Interquartile range (IQR): 60

Descriptive statistics

Standard deviation: 78.857305
Coefficient of variation (CV): 0.7118815
Kurtosis: 0.18570424
Mean: 110.77308
Median Absolute Deviation (MAD): 33
Skewness: 1.1209258
Sum: 99603164
Variance: 6218.4746
Monotonicity: Not monotonic
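
The Median Absolute Deviation is the median of the absolute distances from the median, which makes it far less sensitive to long loan terms than the standard deviation. A minimal sketch, assuming the unscaled definition (no normal-consistency constant); the input list is a made-up handful of terms in months, not data from the report.

```python
from statistics import median

def median_absolute_deviation(values):
    """Median of |x - median(x)|, without any scaling constant."""
    center = median(values)
    return median(abs(v - center) for v in values)

terms = [60, 84, 84, 120, 300]  # illustrative values only
print(median_absolute_deviation(terms))
```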
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
84 230162
25.6%
60 89945
 
10.0%
240 85982
 
9.6%
120 77654
 
8.6%
300 44727
 
5.0%
180 28164
 
3.1%
36 19800
 
2.2%
12 17095
 
1.9%
48 15621
 
1.7%
72 9419
 
1.0%
Other values (402) 280595
31.2%
ValueCountFrequency (%)
0 810
 
0.1%
1 1608
0.2%
2 1809
0.2%
3 2112
0.2%
4 2173
0.2%
5 1866
0.2%
6 3054
0.3%
7 1761
0.2%
8 1693
0.2%
9 1875
0.2%
ValueCountFrequency (%)
569 1
< 0.1%
527 1
< 0.1%
511 1
< 0.1%
505 1
< 0.1%
481 1
< 0.1%
480 1
< 0.1%
461 1
< 0.1%
449 1
< 0.1%
445 1
< 0.1%
443 1
< 0.1%

NoEmp
Real number (ℝ)

Distinct: 599
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.411353
Minimum: 0
Maximum: 9999
Zeros: 6631
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
Median: 4
Q3: 10
95-th percentile: 40
Maximum: 9999
Range: 9999
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 74.108196
Coefficient of variation (CV): 6.4942514
Kurtosis: 7965.2886
Mean: 11.411353
Median Absolute Deviation (MAD): 3
Skewness: 80.248244
Sum: 10260678
Variance: 5492.0248
Monotonicity: Not monotonic
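
A skewness this large reflects a handful of extreme employee counts (up to 9999) against a median of 4. The sketch below computes the adjusted Fisher-Pearson sample skewness, which is the estimator pandas uses for `Series.skew`; that the report relies on the same estimator is an assumption here. `sample_skewness` is a hypothetical helper and the inputs are made-up values.

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: sqrt(n(n-1))/(n-2) * m3 / m2**1.5."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5

print(sample_skewness([1, 2, 3, 4, 5]))   # symmetric data
print(sample_skewness([1, 1, 1, 2, 50]))  # one large outlier pulls the tail right
```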
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1 154254
17.2%
2 138297
15.4%
3 90674
10.1%
4 73644
 
8.2%
5 60319
 
6.7%
6 45759
 
5.1%
10 31536
 
3.5%
7 31495
 
3.5%
8 31361
 
3.5%
12 20822
 
2.3%
Other values (589) 221003
24.6%
ValueCountFrequency (%)
0 6631
 
0.7%
1 154254
17.2%
2 138297
15.4%
3 90674
10.1%
4 73644
8.2%
5 60319
 
6.7%
6 45759
 
5.1%
7 31495
 
3.5%
8 31361
 
3.5%
9 18131
 
2.0%
ValueCountFrequency (%)
9999 4
< 0.1%
9992 1
 
< 0.1%
9945 1
 
< 0.1%
9090 1
 
< 0.1%
9000 2
 
< 0.1%
8500 1
 
< 0.1%
8041 1
 
< 0.1%
8018 1
 
< 0.1%
8000 7
< 0.1%
7999 1
 
< 0.1%

NewExist
Categorical

Distinct3
Distinct (%)< 0.1%
Missing136
Missing (%)< 0.1%
Memory size6.9 MiB
1.0
644869 
2.0
253125 
0.0
 
1034

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters2697084
Distinct characters4
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2.0
2nd row2.0
3rd row1.0
4th row1.0
5th row1.0

Common Values

Value      Count   Frequency (%)
1.0        644869  71.7%
2.0        253125  28.2%
0.0        1034    0.1%
(Missing)  136     < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1.0 644869
71.7%
2.0 253125
 
28.2%
0.0 1034
 
0.1%

Most occurring characters

ValueCountFrequency (%)
0 900062
33.4%
. 899028
33.3%
1 644869
23.9%
2 253125
 
9.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1798056
66.7%
Other Punctuation 899028
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 900062
50.1%
1 644869
35.9%
2 253125
 
14.1%
Other Punctuation
ValueCountFrequency (%)
. 899028
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 2697084
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 900062
33.4%
. 899028
33.3%
1 644869
23.9%
2 253125
 
9.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2697084
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 900062
33.4%
. 899028
33.3%
1 644869
23.9%
2 253125
 
9.4%

CreateJob
Real number (ℝ)

SKEWED  ZEROS 

Distinct246
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean8.4303764
Minimum0
Maximum8800
Zeros629248
Zeros (%)70.0%
Negative0
Negative (%)0.0%
Memory size6.9 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q31
95-th percentile10
Maximum8800
Range8800
Interquartile range (IQR)1

Descriptive statistics

Standard deviation236.68817
Coefficient of variation (CV)28.075634
Kurtosis1369.911
Mean8.4303764
Median Absolute Deviation (MAD)0
Skewness36.991355
Sum7580291
Variance56021.288
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 629248
70.0%
1 63174
 
7.0%
2 57831
 
6.4%
3 28806
 
3.2%
4 20511
 
2.3%
5 18691
 
2.1%
10 11602
 
1.3%
6 11009
 
1.2%
8 7378
 
0.8%
7 6374
 
0.7%
Other values (236) 44540
 
5.0%
ValueCountFrequency (%)
0 629248
70.0%
1 63174
 
7.0%
2 57831
 
6.4%
3 28806
 
3.2%
4 20511
 
2.3%
5 18691
 
2.1%
6 11009
 
1.2%
7 6374
 
0.7%
8 7378
 
0.8%
9 3330
 
0.4%
ValueCountFrequency (%)
8800 648
0.1%
5621 1
 
< 0.1%
5199 1
 
< 0.1%
5085 1
 
< 0.1%
3500 1
 
< 0.1%
3100 1
 
< 0.1%
3000 4
 
< 0.1%
2515 1
 
< 0.1%
2140 1
 
< 0.1%
2020 1
 
< 0.1%

RetainedJob
Real number (ℝ)

SKEWED  ZEROS 

Distinct358
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean10.797257
Minimum0
Maximum9500
Zeros440403
Zeros (%)49.0%
Negative0
Negative (%)0.0%
Memory size6.9 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median1
Q34
95-th percentile20
Maximum9500
Range9500
Interquartile range (IQR)4

Descriptive statistics

Standard deviation237.1206
Coefficient of variation (CV)21.961188
Kurtosis1362.0182
Mean10.797257
Median Absolute Deviation (MAD)1
Skewness36.854812
Sum9708505
Variance56226.179
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 440403
49.0%
1 88790
 
9.9%
2 76851
 
8.5%
3 49963
 
5.6%
4 39666
 
4.4%
5 32627
 
3.6%
6 23796
 
2.6%
7 16530
 
1.8%
8 15698
 
1.7%
10 15438
 
1.7%
Other values (348) 99402
 
11.1%
ValueCountFrequency (%)
0 440403
49.0%
1 88790
 
9.9%
2 76851
 
8.5%
3 49963
 
5.6%
4 39666
 
4.4%
5 32627
 
3.6%
6 23796
 
2.6%
7 16530
 
1.8%
8 15698
 
1.7%
9 8735
 
1.0%
ValueCountFrequency (%)
9500 1
 
< 0.1%
8800 648
0.1%
7250 1
 
< 0.1%
5000 1
 
< 0.1%
4441 1
 
< 0.1%
4000 2
 
< 0.1%
3900 1
 
< 0.1%
3860 1
 
< 0.1%
3225 1
 
< 0.1%
3200 1
 
< 0.1%
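
The value 8800 appears exactly 648 times in the extreme-value tables of both CreateJob and RetainedJob. One possible reading, which the report itself does not confirm, is that these are the same 648 rows carrying a placeholder code rather than a real job count. A small sketch of how that could be checked on the raw data follows; the rows are made-up illustrations, not records from the dataset, and `count_joint` is a hypothetical helper:

```python
def count_joint(rows, value):
    """Count rows where both CreateJob and RetainedJob equal `value`."""
    return sum(
        1
        for row in rows
        if row["CreateJob"] == value and row["RetainedJob"] == value
    )

# Made-up example rows (NOT taken from the dataset).
rows = [
    {"CreateJob": 0, "RetainedJob": 3},
    {"CreateJob": 8800, "RetainedJob": 8800},
    {"CreateJob": 2, "RetainedJob": 8800},
    {"CreateJob": 8800, "RetainedJob": 8800},
]

print(count_joint(rows, 8800))
```

If the count on the real data equals 648, the two columns share the same rows, and recoding or excluding them before computing means and skewness may be worth considering.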

FranchiseCode
Real number (ℝ)

Distinct2768
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2753.7259
Minimum0
Maximum99999
Zeros208835
Zeros (%)23.2%
Negative0
Negative (%)0.0%
Memory size6.9 MiB

Quantile statistics

Minimum0
5-th percentile0
Q11
median1
Q31
95-th percentile15805
Maximum99999
Range99999
Interquartile range (IQR)0

Descriptive statistics

Standard deviation12758.019
Coefficient of variation (CV)4.6330025
Kurtosis24.409524
Mean2753.7259
Median Absolute Deviation (MAD)0
Skewness4.9752152
Sum2.4760512 × 109
Variance1.6276705 × 108
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1 638554
71.0%
0 208835
 
23.2%
78760 3373
 
0.4%
68020 1921
 
0.2%
50564 1034
 
0.1%
21780 1003
 
0.1%
25650 715
 
0.1%
79140 659
 
0.1%
22470 615
 
0.1%
17998 606
 
0.1%
Other values (2758) 41849
 
4.7%
ValueCountFrequency (%)
0 208835
 
23.2%
1 638554
71.0%
3 12
 
< 0.1%
395 5
 
< 0.1%
399 3
 
< 0.1%
400 2
 
< 0.1%
401 12
 
< 0.1%
404 1
 
< 0.1%
407 34
 
< 0.1%
414 2
 
< 0.1%
ValueCountFrequency (%)
99999 1
 
< 0.1%
92006 4
 
< 0.1%
92000 9
< 0.1%
91999 11
< 0.1%
91450 2
 
< 0.1%
91446 1
 
< 0.1%
91443 2
 
< 0.1%
91435 1
 
< 0.1%
91424 1
 
< 0.1%
91423 2
 
< 0.1%

UrbanRural
Categorical

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size6.9 MiB
1
470654 
0
323167 
2
105343 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters899164
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

Value  Count   Frequency (%)
1      470654  52.3%
0      323167  35.9%
2      105343  11.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1 470654
52.3%
0 323167
35.9%
2 105343
 
11.7%

Most occurring characters

ValueCountFrequency (%)
1 470654
52.3%
0 323167
35.9%
2 105343
 
11.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 899164
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 470654
52.3%
0 323167
35.9%
2 105343
 
11.7%

Most occurring scripts

ValueCountFrequency (%)
Common 899164
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 470654
52.3%
0 323167
35.9%
2 105343
 
11.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 899164
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 470654
52.3%
0 323167
35.9%
2 105343
 
11.7%

RevLineCr
Categorical

Distinct18
Distinct (%)< 0.1%
Missing4528
Missing (%)0.5%
Memory size6.9 MiB
N
420288 
0
257602 
Y
201397 
T
 
15284
1
 
23
Other values (13)
 
42

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters894636
Distinct characters18
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique9 ?
Unique (%)< 0.1%

Sample

1st rowN
2nd rowN
3rd rowN
4th rowN
5th rowN

Common Values

Value             Count   Frequency (%)
N                 420288  46.7%
0                 257602  28.6%
Y                 201397  22.4%
T                 15284   1.7%
1                 23      < 0.1%
R                 14      < 0.1%
`                 11      < 0.1%
2                 6       < 0.1%
C                 2       < 0.1%
5                 1       < 0.1%
Other values (8)  8       < 0.1%
(Missing)         4528    0.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
n 420288
47.0%
0 257602
28.8%
y 201397
22.5%
t 15284
 
1.7%
1 23
 
< 0.1%
r 14
 
< 0.1%
14
 
< 0.1%
2 6
 
< 0.1%
c 2
 
< 0.1%
5 1
 
< 0.1%
Other values (5) 5
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
N 420288
47.0%
0 257602
28.8%
Y 201397
22.5%
T 15284
 
1.7%
1 23
 
< 0.1%
R 14
 
< 0.1%
` 11
 
< 0.1%
2 6
 
< 0.1%
C 2
 
< 0.1%
3 1
 
< 0.1%
Other values (8) 8
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 636987
71.2%
Decimal Number 257635
28.8%
Modifier Symbol 11
 
< 0.1%
Other Punctuation 2
 
< 0.1%
Dash Punctuation 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 420288
66.0%
Y 201397
31.6%
T 15284
 
2.4%
R 14
 
< 0.1%
C 2
 
< 0.1%
A 1
 
< 0.1%
Q 1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0 257602
> 99.9%
1 23
 
< 0.1%
2 6
 
< 0.1%
3 1
 
< 0.1%
7 1
 
< 0.1%
5 1
 
< 0.1%
4 1
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
, 1
50.0%
. 1
50.0%
Modifier Symbol
ValueCountFrequency (%)
` 11
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 636987
71.2%
Common 257649
28.8%

Most frequent character per script

Common
ValueCountFrequency (%)
0 257602
> 99.9%
1 23
 
< 0.1%
` 11
 
< 0.1%
2 6
 
< 0.1%
3 1
 
< 0.1%
, 1
 
< 0.1%
7 1
 
< 0.1%
5 1
 
< 0.1%
. 1
 
< 0.1%
4 1
 
< 0.1%
Latin
ValueCountFrequency (%)
N 420288
66.0%
Y 201397
31.6%
T 15284
 
2.4%
R 14
 
< 0.1%
C 2
 
< 0.1%
A 1
 
< 0.1%
Q 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 894636
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 420288
47.0%
0 257602
28.8%
Y 201397
22.5%
T 15284
 
1.7%
1 23
 
< 0.1%
R 14
 
< 0.1%
` 11
 
< 0.1%
2 6
 
< 0.1%
C 2
 
< 0.1%
3 1
 
< 0.1%
Other values (8) 8
 
< 0.1%
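
Although the report lists 18 distinct values for RevLineCr, four of them (N, 0, Y and T) cover more than 99% of the rows, and the remaining 14 codes occur only 65 times in total. A minimal sketch of one way to normalise such a flag column, assuming only `Y` and `N` are trusted and every other code (including `0` and `T`, whose meaning the report does not state) is treated as unknown; `normalize_flag` is an illustrative name:

```python
def normalize_flag(raw):
    """Map a raw flag to True/False, or None when missing or unrecognised."""
    if raw is None:
        return None
    code = str(raw).strip().upper()
    if code == "Y":
        return True
    if code == "N":
        return False
    return None  # assumption: '0', 'T', '1', '`', ... are treated as unknown

# Counts copied from the Common Values table (rare codes omitted).
counts = {"N": 420288, "0": 257602, "Y": 201397, "T": 15284}
recognised = sum(n for code, n in counts.items()
                 if normalize_flag(code) is not None)
print(recognised)
```

Under this strict rule, 621685 of the 899164 rows keep a usable value. Whether `0` should instead be read as `N` is a decision for whoever knows how the field was populated.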

LowDoc
Categorical

Distinct8
Distinct (%)< 0.1%
Missing2582
Missing (%)0.3%
Memory size6.9 MiB
N
782822 
Y
110335 
0
 
1491
C
 
758
S
 
603
Other values (3)
 
573

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters896582
Distinct characters8
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowY
2nd rowY
3rd rowN
4th rowY
5th rowN

Common Values

Value      Count   Frequency (%)
N          782822  87.1%
Y          110335  12.3%
0          1491    0.2%
C          758     0.1%
S          603     0.1%
A          497     0.1%
R          75      < 0.1%
1          1       < 0.1%
(Missing)  2582    0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
n 782822
87.3%
y 110335
 
12.3%
0 1491
 
0.2%
c 758
 
0.1%
s 603
 
0.1%
a 497
 
0.1%
r 75
 
< 0.1%
1 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
N 782822
87.3%
Y 110335
 
12.3%
0 1491
 
0.2%
C 758
 
0.1%
S 603
 
0.1%
A 497
 
0.1%
R 75
 
< 0.1%
1 1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 895090
99.8%
Decimal Number 1492
 
0.2%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 782822
87.5%
Y 110335
 
12.3%
C 758
 
0.1%
S 603
 
0.1%
A 497
 
0.1%
R 75
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0 1491
99.9%
1 1
 
0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 895090
99.8%
Common 1492
 
0.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 782822
87.5%
Y 110335
 
12.3%
C 758
 
0.1%
S 603
 
0.1%
A 497
 
0.1%
R 75
 
< 0.1%
Common
ValueCountFrequency (%)
0 1491
99.9%
1 1
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 896582
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 782822
87.3%
Y 110335
 
12.3%
0 1491
 
0.2%
C 758
 
0.1%
S 603
 
0.1%
A 497
 
0.1%
R 75
 
< 0.1%
1 1
 
< 0.1%

ChgOffDate
Categorical

HIGH CARDINALITY  MISSING 

Distinct6448
Distinct (%)4.0%
Missing736465
Missing (%)81.9%
Memory size6.9 MiB
13-Mar-10
 
734
20-Feb-10
 
614
30-Jan-10
 
519
6-Feb-10
 
461
6-Mar-10
 
422
Other values (6443)
159949 

Length

Max length9
Median length9
Mean length8.7163043
Min length8

Characters and Unicode

Total characters1418134
Distinct characters33
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique861 ?
Unique (%)0.5%

Sample

1st row24-Jun-91
2nd row18-Apr-02
3rd row4-Oct-89
4th row26-Jun-14
5th row4-Oct-05

Common Values

ValueCountFrequency (%)
13-Mar-10 734
 
0.1%
20-Feb-10 614
 
0.1%
30-Jan-10 519
 
0.1%
6-Feb-10 461
 
0.1%
6-Mar-10 422
 
< 0.1%
10-Jun-10 415
 
< 0.1%
20-Mar-10 414
 
< 0.1%
13-Feb-10 400
 
< 0.1%
7-Jun-10 350
 
< 0.1%
3-Jun-10 338
 
< 0.1%
Other values (6438) 158032
 
17.6%
(Missing) 736465
81.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
13-mar-10 734
 
0.5%
20-feb-10 614
 
0.4%
30-jan-10 519
 
0.3%
6-feb-10 461
 
0.3%
6-mar-10 422
 
0.3%
10-jun-10 415
 
0.3%
20-mar-10 414
 
0.3%
13-feb-10 400
 
0.2%
7-jun-10 350
 
0.2%
3-jun-10 338
 
0.2%
Other values (6438) 158032
97.1%

Most occurring characters

ValueCountFrequency (%)
- 325398
22.9%
1 177588
 
12.5%
0 126799
 
8.9%
2 83425
 
5.9%
u 48822
 
3.4%
9 46885
 
3.3%
J 44922
 
3.2%
a 43197
 
3.0%
8 38336
 
2.7%
e 37857
 
2.7%
Other values (23) 444905
31.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 604639
42.6%
Dash Punctuation 325398
22.9%
Lowercase Letter 325398
22.9%
Uppercase Letter 162699
 
11.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
u 48822
15.0%
a 43197
13.3%
e 37857
11.6%
n 30637
9.4%
r 28866
8.9%
p 28398
8.7%
c 21231
6.5%
g 16046
 
4.9%
y 15627
 
4.8%
l 14285
 
4.4%
Other values (4) 40432
12.4%
Decimal Number
ValueCountFrequency (%)
1 177588
29.4%
0 126799
21.0%
2 83425
13.8%
9 46885
 
7.8%
8 38336
 
6.3%
3 37546
 
6.2%
6 28366
 
4.7%
7 23654
 
3.9%
4 22727
 
3.8%
5 19313
 
3.2%
Uppercase Letter
ValueCountFrequency (%)
J 44922
27.6%
M 31051
19.1%
A 29488
18.1%
S 14956
 
9.2%
F 12352
 
7.6%
O 10682
 
6.6%
D 10549
 
6.5%
N 8699
 
5.3%
Dash Punctuation
ValueCountFrequency (%)
- 325398
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 930037
65.6%
Latin 488097
34.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
u 48822
 
10.0%
J 44922
 
9.2%
a 43197
 
8.9%
e 37857
 
7.8%
M 31051
 
6.4%
n 30637
 
6.3%
A 29488
 
6.0%
r 28866
 
5.9%
p 28398
 
5.8%
c 21231
 
4.3%
Other values (12) 143628
29.4%
Common
ValueCountFrequency (%)
- 325398
35.0%
1 177588
19.1%
0 126799
 
13.6%
2 83425
 
9.0%
9 46885
 
5.0%
8 38336
 
4.1%
3 37546
 
4.0%
6 28366
 
3.0%
7 23654
 
2.5%
4 22727
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1418134
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 325398
22.9%
1 177588
 
12.5%
0 126799
 
8.9%
2 83425
 
5.9%
u 48822
 
3.4%
9 46885
 
3.3%
J 44922
 
3.2%
a 43197
 
3.0%
8 38336
 
2.7%
e 37857
 
2.7%
Other values (23) 444905
31.4%
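
ChgOffDate is profiled as categorical because its values are strings such as `13-Mar-10`. A minimal sketch of converting them to dates with the Python standard library, assuming the format is always day, abbreviated English month name, two-digit year; the helper name `parse_report_date` is illustrative. Two caveats apply: `%b` depends on the process locale, and `%y` follows the POSIX convention (69 to 99 map to 1969 to 1999, 00 to 68 map to 2000 to 2068), which may or may not suit every date in this dataset.

```python
from datetime import date, datetime

def parse_report_date(raw):
    """Parse strings like '4-Oct-89'; return None for missing values."""
    if raw is None or not str(raw).strip():
        return None
    return datetime.strptime(str(raw).strip(), "%d-%b-%y").date()

# Sample values copied from the report, plus one missing entry.
for raw in ["24-Jun-91", "18-Apr-02", "4-Oct-89", "26-Jun-14", None]:
    print(raw, "->", parse_report_date(raw))
```

With 736465 of 899164 values missing, the missing branch is the common case here rather than an edge case.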

DisbursementDate
Categorical

Distinct8472
Distinct (%)0.9%
Missing2368
Missing (%)0.3%
Memory size6.9 MiB
31-Jul-95
 
10371
30-Apr-95
 
10320
31-Jan-95
 
9745
31-Oct-94
 
8890
31-Oct-95
 
8161
Other values (8467)
849309 

Length

Max length9
Median length9
Mean length8.9532067
Min length8

Characters and Unicode

Total characters8029200
Distinct characters33
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique2548 ?
Unique (%)0.3%

Sample

1st row28-Feb-99
2nd row31-May-97
3rd row31-Dec-97
4th row30-Jun-97
5th row14-May-97

Common Values

ValueCountFrequency (%)
31-Jul-95 10371
 
1.2%
30-Apr-95 10320
 
1.1%
31-Jan-95 9745
 
1.1%
31-Oct-94 8890
 
1.0%
31-Oct-95 8161
 
0.9%
30-Apr-96 8085
 
0.9%
31-Jan-96 7363
 
0.8%
31-Mar-06 7033
 
0.8%
31-Mar-05 6810
 
0.8%
28-Feb-06 6769
 
0.8%
Other values (8462) 813249
90.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
31-jul-95 10371
 
1.2%
30-apr-95 10320
 
1.2%
31-jan-95 9745
 
1.1%
31-oct-94 8890
 
1.0%
31-oct-95 8161
 
0.9%
30-apr-96 8085
 
0.9%
31-jan-96 7363
 
0.8%
31-mar-06 7033
 
0.8%
31-mar-05 6810
 
0.8%
28-feb-06 6769
 
0.8%
Other values (8462) 813249
90.7%

Most occurring characters

ValueCountFrequency (%)
- 1793592
22.3%
0 859335
 
10.7%
3 816703
 
10.2%
1 685810
 
8.5%
9 389603
 
4.9%
J 256595
 
3.2%
a 222863
 
2.8%
u 221570
 
2.8%
e 185237
 
2.3%
p 170811
 
2.1%
Other values (23) 2427081
30.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 3545220
44.2%
Dash Punctuation 1793592
22.3%
Lowercase Letter 1793592
22.3%
Uppercase Letter 896796
 
11.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 222863
12.4%
u 221570
12.4%
e 185237
10.3%
p 170811
9.5%
r 170171
9.5%
c 165181
9.2%
n 158131
8.8%
t 101803
5.7%
l 98464
5.5%
y 62981
 
3.5%
Other values (4) 236380
13.2%
Decimal Number
ValueCountFrequency (%)
0 859335
24.2%
3 816703
23.0%
1 685810
19.3%
9 389603
11.0%
2 142927
 
4.0%
8 140403
 
4.0%
5 135196
 
3.8%
6 132712
 
3.7%
7 126025
 
3.6%
4 116506
 
3.3%
Uppercase Letter
ValueCountFrequency (%)
J 256595
28.6%
A 166698
18.6%
M 127146
14.2%
O 101803
 
11.4%
S 64805
 
7.2%
D 63378
 
7.1%
N 59317
 
6.6%
F 57054
 
6.4%
Dash Punctuation
ValueCountFrequency (%)
- 1793592
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 5338812
66.5%
Latin 2690388
33.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
J 256595
 
9.5%
a 222863
 
8.3%
u 221570
 
8.2%
e 185237
 
6.9%
p 170811
 
6.3%
r 170171
 
6.3%
A 166698
 
6.2%
c 165181
 
6.1%
n 158131
 
5.9%
M 127146
 
4.7%
Other values (12) 845985
31.4%
Common
ValueCountFrequency (%)
- 1793592
33.6%
0 859335
16.1%
3 816703
15.3%
1 685810
 
12.8%
9 389603
 
7.3%
2 142927
 
2.7%
8 140403
 
2.6%
5 135196
 
2.5%
6 132712
 
2.5%
7 126025
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8029200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 1793592
22.3%
0 859335
 
10.7%
3 816703
 
10.2%
1 685810
 
8.5%
9 389603
 
4.9%
J 256595
 
3.2%
a 222863
 
2.8%
u 221570
 
2.8%
e 185237
 
2.3%
p 170811
 
2.1%
Other values (23) 2427081
30.2%

DisbursementGross
Categorical

Distinct118859
Distinct (%)13.2%
Missing0
Missing (%)0.0%
Memory size6.9 MiB
$50,000.00
 
43787
$100,000.00
 
36714
$25,000.00
 
27387
$150,000.00
 
23373
$10,000.00
 
21328
Other values (118854)
746575 

Length

Max length15
Median length14
Mean length11.537586
Min length6

Characters and Unicode

Total characters10374182
Distinct characters14
Distinct categories4 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique79785 ?
Unique (%)8.9%

Sample

1st row$60,000.00
2nd row$40,000.00
3rd row$287,000.00
4th row$35,000.00
5th row$229,000.00

Common Values

ValueCountFrequency (%)
$50,000.00 43787
 
4.9%
$100,000.00 36714
 
4.1%
$25,000.00 27387
 
3.0%
$150,000.00 23373
 
2.6%
$10,000.00 21328
 
2.4%
$35,000.00 14748
 
1.6%
$5,000.00 14193
 
1.6%
$75,000.00 13528
 
1.5%
$20,000.00 13462
 
1.5%
$30,000.00 12696
 
1.4%
Other values (118849) 677948
75.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
50,000.00 43787
 
4.9%
100,000.00 36714
 
4.1%
25,000.00 27387
 
3.0%
150,000.00 23373
 
2.6%
10,000.00 21328
 
2.4%
35,000.00 14748
 
1.6%
5,000.00 14193
 
1.6%
75,000.00 13528
 
1.5%
20,000.00 13462
 
1.5%
30,000.00 12696
 
1.4%
Other values (118849) 677948
75.4%

Most occurring characters

ValueCountFrequency (%)
0 4457089
43.0%
, 924978
 
8.9%
$ 899164
 
8.7%
. 899164
 
8.7%
899164
 
8.7%
5 445569
 
4.3%
1 409947
 
4.0%
2 312909
 
3.0%
3 238773
 
2.3%
4 207077
 
2.0%
Other values (4) 680348
 
6.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 6751712
65.1%
Other Punctuation 1824142
 
17.6%
Currency Symbol 899164
 
8.7%
Space Separator 899164
 
8.7%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 4457089
66.0%
5 445569
 
6.6%
1 409947
 
6.1%
2 312909
 
4.6%
3 238773
 
3.5%
4 207077
 
3.1%
7 183883
 
2.7%
6 177786
 
2.6%
8 162618
 
2.4%
9 156061
 
2.3%
Other Punctuation
ValueCountFrequency (%)
, 924978
50.7%
. 899164
49.3%
Currency Symbol
ValueCountFrequency (%)
$ 899164
100.0%
Space Separator
ValueCountFrequency (%)
899164
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 10374182
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 4457089
43.0%
, 924978
 
8.9%
$ 899164
 
8.7%
. 899164
 
8.7%
899164
 
8.7%
5 445569
 
4.3%
1 409947
 
4.0%
2 312909
 
3.0%
3 238773
 
2.3%
4 207077
 
2.0%
Other values (4) 680348
 
6.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10374182
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 4457089
43.0%
, 924978
 
8.9%
$ 899164
 
8.7%
. 899164
 
8.7%
899164
 
8.7%
5 445569
 
4.3%
1 409947
 
4.0%
2 312909
 
3.0%
3 238773
 
2.3%
4 207077
 
2.0%
Other values (4) 680348
 
6.6%
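
The amounts profiled in this section are stored as text such as `$50,000.00`, which is why the report treats the variable as categorical and counts characters instead of computing a mean or quantiles. The character tables also show one space per value (899164 spaces for 899164 rows). A minimal sketch of converting such strings to numbers, assuming the only non-numeric characters are `$`, `,` and surrounding whitespace; `parse_money` is an illustrative helper:

```python
from decimal import Decimal

def parse_money(raw):
    """Convert a string like '$50,000.00 ' to a Decimal amount."""
    cleaned = raw.strip().replace("$", "").replace(",", "")
    return Decimal(cleaned)

# Sample values copied from the report, with a trailing space added
# as an assumption about where the reported space character sits.
samples = ["$60,000.00 ", "$40,000.00 ", "$287,000.00 ", "$0.00 "]
amounts = [parse_money(s) for s in samples]
print(sum(amounts))
```

The same helper would apply to BalanceGross, ChgOffPrinGr, GrAppv and SBA_Appv, which use the same text format in this report.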

BalanceGross
Categorical

Distinct15
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size6.9 MiB
$0.00
899150 
$12,750.00
 
1
$827,875.00
 
1
$25,000.00
 
1
$37,100.00
 
1
Other values (10)
 
10

Length

Max length12
Median length6
Mean length6.0000767
Min length6

Characters and Unicode

Total characters5395053
Distinct characters14
Distinct categories4 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique14 ?
Unique (%)< 0.1%

Sample

1st row$0.00
2nd row$0.00
3rd row$0.00
4th row$0.00
5th row$0.00

Common Values

ValueCountFrequency (%)
$0.00 899150
> 99.9%
$12,750.00 1
 
< 0.1%
$827,875.00 1
 
< 0.1%
$25,000.00 1
 
< 0.1%
$37,100.00 1
 
< 0.1%
$43,127.00 1
 
< 0.1%
$84,617.00 1
 
< 0.1%
$1,760.00 1
 
< 0.1%
$115,820.00 1
 
< 0.1%
$996,262.00 1
 
< 0.1%
Other values (5) 5
 
< 0.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0.00 899150
> 99.9%
12,750.00 1
 
< 0.1%
827,875.00 1
 
< 0.1%
25,000.00 1
 
< 0.1%
37,100.00 1
 
< 0.1%
43,127.00 1
 
< 0.1%
84,617.00 1
 
< 0.1%
1,760.00 1
 
< 0.1%
115,820.00 1
 
< 0.1%
996,262.00 1
 
< 0.1%
Other values (5) 5
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0 2697490
50.0%
$ 899164
 
16.7%
. 899164
 
16.7%
899164
 
16.7%
, 13
 
< 0.1%
1 11
 
< 0.1%
7 8
 
< 0.1%
2 7
 
< 0.1%
6 7
 
< 0.1%
9 7
 
< 0.1%
Other values (4) 18
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2697548
50.0%
Other Punctuation 899177
 
16.7%
Currency Symbol 899164
 
16.7%
Space Separator 899164
 
16.7%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 2697490
> 99.9%
1 11
 
< 0.1%
7 8
 
< 0.1%
2 7
 
< 0.1%
6 7
 
< 0.1%
9 7
 
< 0.1%
5 6
 
< 0.1%
8 5
 
< 0.1%
4 4
 
< 0.1%
3 3
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
. 899164
> 99.9%
, 13
 
< 0.1%
Currency Symbol
ValueCountFrequency (%)
$ 899164
100.0%
Space Separator
ValueCountFrequency (%)
899164
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 5395053
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 2697490
50.0%
$ 899164
 
16.7%
. 899164
 
16.7%
899164
 
16.7%
, 13
 
< 0.1%
1 11
 
< 0.1%
7 8
 
< 0.1%
2 7
 
< 0.1%
6 7
 
< 0.1%
9 7
 
< 0.1%
Other values (4) 18
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5395053
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 2697490
50.0%
$ 899164
 
16.7%
. 899164
 
16.7%
899164
 
16.7%
, 13
 
< 0.1%
1 11
 
< 0.1%
7 8
 
< 0.1%
2 7
 
< 0.1%
6 7
 
< 0.1%
9 7
 
< 0.1%
Other values (4) 18
 
< 0.1%

MIS_Status
Categorical

Distinct2
Distinct (%)< 0.1%
Missing1997
Missing (%)0.2%
Memory size6.9 MiB
P I F
739609 
CHGOFF
157558 

Length

Max length6
Median length5
Mean length5.1756172
Min length5

Characters and Unicode

Total characters4643393
Distinct characters8
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowP I F
2nd rowP I F
3rd rowP I F
4th rowP I F
5th rowP I F

Common Values

Value      Count   Frequency (%)
P I F      739609  82.3%
CHGOFF     157558  17.5%
(Missing)  1997    0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
p 739609
31.1%
i 739609
31.1%
f 739609
31.1%
chgoff 157558
 
6.6%

Most occurring characters

ValueCountFrequency (%)
1479218
31.9%
F 1054725
22.7%
P 739609
15.9%
I 739609
15.9%
C 157558
 
3.4%
H 157558
 
3.4%
G 157558
 
3.4%
O 157558
 
3.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 3164175
68.1%
Space Separator 1479218
31.9%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
F 1054725
33.3%
P 739609
23.4%
I 739609
23.4%
C 157558
 
5.0%
H 157558
 
5.0%
G 157558
 
5.0%
O 157558
 
5.0%
Space Separator
ValueCountFrequency (%)
1479218
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3164175
68.1%
Common 1479218
31.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
F 1054725
33.3%
P 739609
23.4%
I 739609
23.4%
C 157558
 
5.0%
H 157558
 
5.0%
G 157558
 
5.0%
O 157558
 
5.0%
Common
ValueCountFrequency (%)
1479218
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4643393
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1479218
31.9%
F 1054725
22.7%
P 739609
15.9%
I 739609
15.9%
C 157558
 
3.4%
H 157558
 
3.4%
G 157558
 
3.4%
O 157558
 
3.4%
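
MIS_Status has two labels, `P I F` and `CHGOFF`, plus 1997 missing values. A minimal sketch of turning it into a binary indicator, assuming `CHGOFF` marks a charged-off loan and `P I F` a loan paid in full (the report does not define the labels, so that reading and the helper name `encode_status` are assumptions):

```python
def encode_status(raw):
    """Return 1 for CHGOFF, 0 for 'P I F', None for missing or unknown."""
    if raw is None:
        return None
    label = str(raw).strip().upper()
    if label == "CHGOFF":
        return 1
    if label == "P I F":
        return 0
    return None

# Counts copied from the Common Values table.
n_pif, n_chgoff = 739609, 157558
chgoff_share = n_chgoff / (n_pif + n_chgoff)
print(f"{chgoff_share:.4f}")
```

Among labelled rows the CHGOFF share is about 17.6%, so a model trained on this indicator would face a roughly 82/18 class split.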

ChgOffPrinGr
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct83165
Distinct (%)9.2%
Missing0
Missing (%)0.0%
Memory size6.9 MiB
$0.00
737152 
$50,000.00
 
2110
$10,000.00
 
1865
$25,000.00
 
1371
$35,000.00
 
1345
Other values (83160)
155321 

Length

Max length14
Median length6
Mean length6.8997235
Min length6

Characters and Unicode

Total characters6203983
Distinct characters14
Distinct categories4 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique52342 ?
Unique (%)5.8%

Sample

1st row$0.00
2nd row$0.00
3rd row$0.00
4th row$0.00
5th row$0.00

Common Values

ValueCountFrequency (%)
$0.00 737152
82.0%
$50,000.00 2110
 
0.2%
$10,000.00 1865
 
0.2%
$25,000.00 1371
 
0.2%
$35,000.00 1345
 
0.1%
$100,000.00 1028
 
0.1%
$20,000.00 594
 
0.1%
$30,000.00 492
 
0.1%
$15,000.00 467
 
0.1%
$5,000.00 356
 
< 0.1%
Other values (83155) 152384
 
16.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0.00 737152
82.0%
50,000.00 2110
 
0.2%
10,000.00 1865
 
0.2%
25,000.00 1371
 
0.2%
35,000.00 1345
 
0.1%
100,000.00 1028
 
0.1%
20,000.00 594
 
0.1%
30,000.00 492
 
0.1%
15,000.00 467
 
0.1%
5,000.00 356
 
< 0.1%
Other values (83155) 152384
 
16.9%

Most occurring characters

ValueCountFrequency (%)
0 2643222
42.6%
$ 899164
 
14.5%
. 899164
 
14.5%
899164
 
14.5%
, 161591
 
2.6%
1 98607
 
1.6%
2 88727
 
1.4%
4 86077
 
1.4%
9 81470
 
1.3%
3 79226
 
1.3%
Other values (4) 267571
 
4.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 3344900
53.9%
Other Punctuation 1060755
 
17.1%
Currency Symbol 899164
 
14.5%
Space Separator 899164
 
14.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 2643222
79.0%
1 98607
 
2.9%
2 88727
 
2.7%
4 86077
 
2.6%
9 81470
 
2.4%
3 79226
 
2.4%
5 71099
 
2.1%
8 66886
 
2.0%
7 65400
 
2.0%
6 64186
 
1.9%
Other Punctuation
ValueCountFrequency (%)
. 899164
84.8%
, 161591
 
15.2%
Currency Symbol
ValueCountFrequency (%)
$ 899164
100.0%
Space Separator
ValueCountFrequency (%)
899164
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 6203983
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 2643222
42.6%
$ 899164
 
14.5%
. 899164
 
14.5%
899164
 
14.5%
, 161591
 
2.6%
1 98607
 
1.6%
2 88727
 
1.4%
4 86077
 
1.4%
9 81470
 
1.3%
3 79226
 
1.3%
Other values (4) 267571
 
4.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6203983
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 2643222
42.6%
$ 899164
 
14.5%
. 899164
 
14.5%
899164
 
14.5%
, 161591
 
2.6%
1 98607
 
1.6%
2 88727
 
1.4%
4 86077
 
1.4%
9 81470
 
1.3%
3 79226
 
1.3%
Other values (4) 267571
 
4.3%

GrAppv
Categorical

Distinct22128
Distinct (%)2.5%
Missing0
Missing (%)0.0%
Memory size6.9 MiB
$50,000.00
69394 
$25,000.00
 
51258
$100,000.00
 
50977
$10,000.00
 
38366
$150,000.00
 
27624
Other values (22123)
661545 

Length

Max length14
Median length12
Mean length11.513319
Min length8

Characters and Unicode

Total characters10352362
Distinct characters14
Distinct categories4 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique13651 ?
Unique (%)1.5%

Sample

1st row$60,000.00
2nd row$40,000.00
3rd row$287,000.00
4th row$35,000.00
5th row$229,000.00

Common Values

ValueCountFrequency (%)
$50,000.00 69394
 
7.7%
$25,000.00 51258
 
5.7%
$100,000.00 50977
 
5.7%
$10,000.00 38366
 
4.3%
$150,000.00 27624
 
3.1%
$20,000.00 23434
 
2.6%
$35,000.00 23181
 
2.6%
$30,000.00 21004
 
2.3%
$5,000.00 19146
 
2.1%
$15,000.00 18472
 
2.1%
Other values (22118) 556308
61.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
50,000.00 69394
 
7.7%
25,000.00 51258
 
5.7%
100,000.00 50977
 
5.7%
10,000.00 38366
 
4.3%
150,000.00 27624
 
3.1%
20,000.00 23434
 
2.6%
35,000.00 23181
 
2.6%
30,000.00 21004
 
2.3%
5,000.00 19146
 
2.1%
15,000.00 18472
 
2.1%
Other values (22118) 556308
61.9%

Most occurring characters

ValueCountFrequency (%)
0 4946152
47.8%
, 925342
 
8.9%
$ 899164
 
8.7%
. 899164
 
8.7%
899164
 
8.7%
5 450225
 
4.3%
1 345271
 
3.3%
2 266534
 
2.6%
3 180629
 
1.7%
4 133995
 
1.3%
Other values (4) 406722
 
3.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 6729528
65.0%
Other Punctuation 1824506
 
17.6%
Currency Symbol 899164
 
8.7%
Space Separator 899164
 
8.7%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 4946152
73.5%
5 450225
 
6.7%
1 345271
 
5.1%
2 266534
 
4.0%
3 180629
 
2.7%
4 133995
 
2.0%
7 120134
 
1.8%
6 110952
 
1.6%
8 98042
 
1.5%
9 77594
 
1.2%
Other Punctuation
ValueCountFrequency (%)
, 925342
50.7%
. 899164
49.3%
Currency Symbol
ValueCountFrequency (%)
$ 899164
100.0%
Space Separator
ValueCountFrequency (%)
(space) 899164
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 10352362
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 4946152
47.8%
, 925342
 
8.9%
$ 899164
 
8.7%
. 899164
 
8.7%
(space) 899164
 
8.7%
5 450225
 
4.3%
1 345271
 
3.3%
2 266534
 
2.6%
3 180629
 
1.7%
4 133995
 
1.3%
Other values (4) 406722
 
3.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10352362
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 4946152
47.8%
, 925342
 
8.9%
$ 899164
 
8.7%
. 899164
 
8.7%
(space) 899164
 
8.7%
5 450225
 
4.3%
1 345271
 
3.3%
2 266534
 
2.6%
3 180629
 
1.7%
4 133995
 
1.3%
Other values (4) 406722
 
3.9%

SBA_Appv
Categorical

Distinct 38326
Distinct (%) 4.3%
Missing 0
Missing (%) 0.0%
Memory size 6.9 MiB
$25,000.00 49579
$12,500.00 40147
$5,000.00 31135
$50,000.00 25047
$10,000.00 17009
Other values (38321) 736247

Length

Max length 14
Median length 11
Mean length 11.308074
Min length 8

Characters and Unicode

Total characters 10167813
Distinct characters 14
Distinct categories 4
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 23260
Unique (%) 2.6%

Sample

1st row $48,000.00
2nd row $32,000.00
3rd row $215,250.00
4th row $28,000.00
5th row $229,000.00
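The sample rows for SBA_Appv line up row by row with those shown for GrAppv, so the ratio between the two can be worked out directly. The sketch below assumes that SBA_Appv is the SBA-approved portion of the gross approved amount in GrAppv; the report itself does not define either column, and the names `to_amount` and `approved_share` are hypothetical.

```python
from decimal import Decimal


def to_amount(text):
    """Parse a '$1,234.56' style string into a Decimal (assumed format)."""
    return Decimal(text.strip().replace("$", "").replace(",", ""))


def approved_share(sba_appv, gr_appv):
    """Return SBA_Appv divided by GrAppv for one row."""
    return to_amount(sba_appv) / to_amount(gr_appv)


# First five sample rows of each column, in the same row order.
gr_appv = ["$60,000.00", "$40,000.00", "$287,000.00", "$35,000.00", "$229,000.00"]
sba_appv = ["$48,000.00", "$32,000.00", "$215,250.00", "$28,000.00", "$229,000.00"]

shares = [approved_share(s, g) for s, g in zip(sba_appv, gr_appv)]
for g, share in zip(gr_appv, shares):
    print(g, share)
```

For these five rows the ratio takes only a few distinct values, which is one possible reason the most common SBA_Appv amounts are simple fractions of the most common GrAppv amounts.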

Common Values

ValueCountFrequency (%)
$25,000.00 49579
 
5.5%
$12,500.00 40147
 
4.5%
$5,000.00 31135
 
3.5%
$50,000.00 25047
 
2.8%
$10,000.00 17009
 
1.9%
$17,500.00 16141
 
1.8%
$15,000.00 14490
 
1.6%
$7,500.00 12781
 
1.4%
$127,500.00 11946
 
1.3%
$80,000.00 10965
 
1.2%
Other values (38316) 669924
74.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
25,000.00 49579
 
5.5%
12,500.00 40147
 
4.5%
5,000.00 31135
 
3.5%
50,000.00 25047
 
2.8%
10,000.00 17009
 
1.9%
17,500.00 16141
 
1.8%
15,000.00 14490
 
1.6%
7,500.00 12781
 
1.4%
127,500.00 11946
 
1.3%
80,000.00 10965
 
1.2%
Other values (38316) 669924
74.5%

Most occurring characters

ValueCountFrequency (%)
0 4048030
39.8%
, 908994
 
8.9%
$ 899164
 
8.8%
. 899164
 
8.8%
(space) 899164
 
8.8%
5 654346
 
6.4%
2 433556
 
4.3%
1 386969
 
3.8%
7 251493
 
2.5%
3 186643
 
1.8%
Other values (4) 600290
 
5.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 6561327
64.5%
Other Punctuation 1808158
 
17.8%
Currency Symbol 899164
 
8.8%
Space Separator 899164
 
8.8%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 4048030
61.7%
5 654346
 
10.0%
2 433556
 
6.6%
1 386969
 
5.9%
7 251493
 
3.8%
3 186643
 
2.8%
4 180754
 
2.8%
6 151450
 
2.3%
8 150215
 
2.3%
9 117871
 
1.8%
Other Punctuation
ValueCountFrequency (%)
, 908994
50.3%
. 899164
49.7%
Currency Symbol
ValueCountFrequency (%)
$ 899164
100.0%
Space Separator
ValueCountFrequency (%)
(space) 899164
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 10167813
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 4048030
39.8%
, 908994
 
8.9%
$ 899164
 
8.8%
. 899164
 
8.8%
(space) 899164
 
8.8%
5 654346
 
6.4%
2 433556
 
4.3%
1 386969
 
3.8%
7 251493
 
2.5%
3 186643
 
1.8%
Other values (4) 600290
 
5.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10167813
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 4048030
39.8%
, 908994
 
8.9%
$ 899164
 
8.8%
. 899164
 
8.8%
(space) 899164
 
8.8%
5 654346
 
6.4%
2 433556
 
4.3%
1 386969
 
3.8%
7 251493
 
2.5%
3 186643
 
1.8%
Other values (4) 600290
 
5.9%

Interactions

[Interaction plots: 64 images omitted]

Correlations

Variable | LoanNr_ChkDgt | Zip | NAICS | Term | NoEmp | CreateJob | RetainedJob | FranchiseCode | State | BankState | ApprovalFY | NewExist | UrbanRural | RevLineCr | LowDoc | BalanceGross | MIS_Status
LoanNr_ChkDgt | 1.000 | 0.031 | -0.050 | 0.121 | 0.075 | -0.031 | -0.142 | 0.392 | 0.056 | 0.113 | 0.713 | 0.062 | 0.189 | 0.084 | 0.110 | 0.001 | 0.237
Zip | 0.031 | 1.000 | -0.034 | 0.142 | 0.059 | 0.026 | -0.026 | 0.031 | 0.997 | 0.664 | 0.057 | 0.088 | 0.126 | 0.056 | 0.059 | 0.001 | 0.081
NAICS | -0.050 | -0.034 | 1.000 | -0.081 | -0.154 | 0.157 | 0.271 | -0.091 | 0.106 | 0.123 | 0.228 | 0.094 | 0.432 | 0.124 | 0.060 | 0.001 | 0.148
Term | 0.121 | 0.142 | -0.081 | 1.000 | 0.200 | 0.082 | -0.157 | 0.196 | 0.108 | 0.147 | 0.133 | 0.088 | 0.207 | 0.140 | 0.068 | 0.000 | 0.492
NoEmp | 0.075 | 0.059 | -0.154 | 0.200 | 1.000 | 0.034 | 0.124 | 0.121 | 0.009 | 0.005 | 0.010 | 0.005 | 0.010 | 0.000 | 0.000 | 0.000 | 0.004
CreateJob | -0.031 | 0.026 | 0.157 | 0.082 | 0.034 | 1.000 | 0.377 | -0.054 | 0.006 | 0.009 | 0.226 | 0.009 | 0.025 | 0.011 | 0.003 | 0.000 | 0.012
RetainedJob | -0.142 | -0.026 | 0.271 | -0.157 | 0.124 | 0.377 | 1.000 | -0.263 | 0.005 | 0.007 | 0.209 | 0.002 | 0.025 | 0.010 | 0.003 | 0.000 | 0.013
FranchiseCode | 0.392 | 0.031 | -0.091 | 0.196 | 0.121 | -0.054 | -0.263 | 1.000 | 0.030 | 0.036 | 0.022 | 0.099 | 0.013 | 0.044 | 0.014 | 0.005 | 0.022
State | 0.056 | 0.997 | 0.106 | 0.108 | 0.009 | 0.006 | 0.005 | 0.030 | 1.000 | 0.640 | 0.039 | 0.107 | 0.207 | 0.061 | 0.073 | 0.000 | 0.111
BankState | 0.113 | 0.664 | 0.123 | 0.147 | 0.005 | 0.009 | 0.007 | 0.036 | 0.640 | 1.000 | 0.067 | 0.112 | 0.271 | 0.118 | 0.095 | 0.000 | 0.198
ApprovalFY | 0.713 | 0.057 | 0.228 | 0.133 | 0.010 | 0.226 | 0.209 | 0.022 | 0.039 | 0.067 | 1.000 | 0.097 | 0.680 | 0.167 | 0.179 | 0.000 | 0.366
NewExist | 0.062 | 0.088 | 0.094 | 0.088 | 0.005 | 0.009 | 0.002 | 0.099 | 0.107 | 0.112 | 0.097 | 1.000 | 0.030 | 0.065 | 0.116 | 0.000 | 0.022
UrbanRural | 0.189 | 0.126 | 0.432 | 0.207 | 0.010 | 0.025 | 0.025 | 0.013 | 0.207 | 0.271 | 0.680 | 0.030 | 1.000 | 0.348 | 0.157 | 0.002 | 0.211
RevLineCr | 0.084 | 0.056 | 0.124 | 0.140 | 0.000 | 0.011 | 0.010 | 0.044 | 0.061 | 0.118 | 0.167 | 0.065 | 0.348 | 1.000 | 0.087 | 0.000 | 0.146
LowDoc | 0.110 | 0.059 | 0.060 | 0.068 | 0.000 | 0.003 | 0.003 | 0.014 | 0.073 | 0.095 | 0.179 | 0.116 | 0.157 | 0.087 | 1.000 | 0.000 | 0.088
BalanceGross | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.005 | 0.000 | 0.000 | 0.000 | 0.000 | 0.002 | 0.000 | 0.000 | 1.000 | 0.000
MIS_Status | 0.237 | 0.081 | 0.148 | 0.492 | 0.004 | 0.012 | 0.013 | 0.022 | 0.111 | 0.198 | 0.366 | 0.022 | 0.211 | 0.146 | 0.088 | 0.000 | 1.000
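To read the matrix at a glance, it can help to list only the pairs whose coefficient exceeds some cutoff. The sketch below does this for a handful of entries copied from the matrix above. The cutoff of 0.5 is an assumption chosen for illustration, not a value stated in the report, and the function name `strong_pairs` is hypothetical.

```python
# A subset of upper-triangle entries copied from the correlation matrix.
correlations = {
    ("LoanNr_ChkDgt", "ApprovalFY"): 0.713,
    ("Zip", "State"): 0.997,
    ("Zip", "BankState"): 0.664,
    ("State", "BankState"): 0.640,
    ("ApprovalFY", "UrbanRural"): 0.680,
    ("Term", "MIS_Status"): 0.492,
    ("NAICS", "UrbanRural"): 0.432,
}


def strong_pairs(corr, threshold=0.5):
    """Return variable pairs with |coefficient| >= threshold, strongest first."""
    selected = [pair for pair, value in corr.items() if abs(value) >= threshold]
    return sorted(selected, key=lambda pair: -abs(corr[pair]))


pairs = strong_pairs(correlations)
for a, b in pairs:
    print(a, b, correlations[(a, b)])
```

With this cutoff the selected pairs are the same ones flagged as "High correlation" in the report's alerts, while Term and MIS_Status (0.492) fall just below it.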

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
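One common way to compute a nullity correlation is to take the Pearson correlation between the missing-value indicators of two columns. The sketch below follows that approach with toy data. It is an assumption that the report's heatmap is computed this way, and the function name `nullity_correlation` and the toy column values are hypothetical.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the is-missing indicators of two columns.

    Missing values are represented by None. Returns None when either
    column is entirely present or entirely missing, since the
    correlation is undefined for a constant indicator.
    """
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


# Toy columns: None marks a missing cell.
chg_off_date = [None, None, "24-Jun-91", None]
other_column = ["x", None, "y", None]
r = nullity_correlation(chg_off_date, other_column)
print(r)
```

A value near 1 means the two columns tend to be missing in the same rows, a value near -1 means one tends to be present where the other is missing, and columns with no missing values at all have no defined nullity correlation.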

Sample

LoanNr_ChkDgtNameCityStateZipBankBankStateNAICSApprovalDateApprovalFYTermNoEmpNewExistCreateJobRetainedJobFranchiseCodeUrbanRuralRevLineCrLowDocChgOffDateDisbursementDateDisbursementGrossBalanceGrossMIS_StatusChgOffPrinGrGrAppvSBA_Appv
01000014003ABC HOBBYCRAFTEVANSVILLEIN47711FIFTH THIRD BANKOH45112028-Feb-9719978442.00010NYNaN28-Feb-99$60,000.00$0.00P I F$0.00$60,000.00$48,000.00
11000024006LANDMARK BAR & GRILLE (THE)NEW PARISIN465261ST SOURCE BANKIN72241028-Feb-9719976022.00010NYNaN31-May-97$40,000.00$0.00P I F$0.00$40,000.00$32,000.00
21000034009WHITLOCK DDS, TODD M.BLOOMINGTONIN47401GRANT COUNTY STATE BANKIN62121028-Feb-97199718071.00010NNNaN31-Dec-97$287,000.00$0.00P I F$0.00$287,000.00$215,250.00
31000044001BIG BUCKS PAWN & JEWELRY, LLCBROKEN ARROWOK740121ST NATL BK & TR CO OF BROKENOK028-Feb-9719976021.00010NYNaN30-Jun-97$35,000.00$0.00P I F$0.00$35,000.00$28,000.00
41000054004ANASTASIA CONFECTIONS, INC.ORLANDOFL32801FLORIDA BUS. DEVEL CORPFL028-Feb-971997240141.07710NNNaN14-May-97$229,000.00$0.00P I F$0.00$229,000.00$229,000.00
51000084002B&T SCREW MACHINE COMPANY, INCPLAINVILLECT6062TD BANK, NATIONAL ASSOCIATIONDE33272128-Feb-971997120191.00010NNNaN30-Jun-97$517,000.00$0.00P I F$0.00$517,000.00$387,750.00
61000093009MIDDLE ATLANTIC SPORTS CO INCUNIONNJ7083WELLS FARGO BANK NATL ASSOCSD02-Jun-80198045452.00000NN24-Jun-9122-Jul-80$600,000.00$0.00CHGOFF$208,959.00$600,000.00$499,998.00
71000094005WEAVER PRODUCTSSUMMERFIELDFL34491REGIONS BANKAL81111828-Feb-9719978412.00010NYNaN30-Jun-98$45,000.00$0.00P I F$0.00$45,000.00$36,000.00
81000104006TURTLE BEACH INNPORT SAINT JOEFL32456CENTENNIAL BANKFL72131028-Feb-97199729722.00010NNNaN31-Jul-97$305,000.00$0.00P I F$0.00$305,000.00$228,750.00
91000124001INTEXT BUILDING SYS LLCGLASTONBURYCT6073WEBSTER BANK NATL ASSOCCT028-Feb-9719978432.00010NYNaN30-Apr-97$70,000.00$0.00P I F$0.00$70,000.00$56,000.00
LoanNr_ChkDgtNameCityStateZipBankBankStateNAICSApprovalDateApprovalFYTermNoEmpNewExistCreateJobRetainedJobFranchiseCodeUrbanRuralRevLineCrLowDocChgOffDateDisbursementDateDisbursementGrossBalanceGrossMIS_StatusChgOffPrinGrGrAppvSBA_Appv
8991549995423005LITWIN LIVERY SERVICES, INC.CAMPBELLOH44405JPMORGAN CHASE BANK NATL ASSOCIL027-Feb-9719976011.000100NNaN30-Sep-97$10,000.00$0.00P I F$0.00$10,000.00$5,000.00
8991559995453003FUTURE LEADERS CENTER, INC.SO. OZONE PARKNY11420FLUSHING BANKNY62441027-Feb-97199718021.000100NNaN30-Jun-97$123,000.00$0.00P I F$0.00$128,000.00$96,000.00
8991569995473009FABRICATORS STEEL, INC.BALTIMOREMD21224BANK OF AMERICA NATL ASSOCMD33243127-Feb-97199760201.000100NNaN30-Jun-97$50,000.00$0.00P I F$0.00$50,000.00$25,000.00
8991579995493004PULLTARPS MFG.EL CAJONCA92020U.S. BANK NATIONAL ASSOCIATIONCA31491227-Feb-97199736401.00010NNNaN31-Mar-97$200,000.00$0.00P I F$0.00$200,000.00$150,000.00
8991589995563001SHADES WINDOW TINTING AUTO ALAIRVINGTX75062LOANS FROM OLD CLOSED LENDERSDC027-Feb-9719978452.00010NYNaN30-Jun-97$79,000.00$0.00P I F$0.00$79,000.00$63,200.00
8991599995573004FABRIC FARMSUPPER ARLINGTONOH43221JPMORGAN CHASE BANK NATL ASSOCIL45112027-Feb-9719976061.000100NNaN30-Sep-97$70,000.00$0.00P I F$0.00$70,000.00$56,000.00
8991609995603000FABRIC FARMSCOLUMBUSOH43221JPMORGAN CHASE BANK NATL ASSOCIL45113027-Feb-9719976061.00010YNNaN31-Oct-97$85,000.00$0.00P I F$0.00$85,000.00$42,500.00
8991619995613003RADCO MANUFACTURING CO.,INC.SANTA MARIACA93455RABOBANK, NATIONAL ASSOCIATIONCA33232127-Feb-971997108261.00010NNNaN30-Sep-97$300,000.00$0.00P I F$0.00$300,000.00$225,000.00
8991629995973006MARUTAMA HAWAII, INC.HONOLULUHI96830BANK OF HAWAIIHI027-Feb-9719976061.00010NY8-Mar-0031-Mar-97$75,000.00$0.00CHGOFF$46,383.00$75,000.00$60,000.00
8991639996003010PACIFIC TRADEWINDS FAN & LIGHTKAILUAHI96734CENTRAL PACIFIC BANKHI027-Feb-9719974812.00010NNNaN31-May-97$30,000.00$0.00P I F$0.00$30,000.00$24,000.00